Exploratory Data Analysis of Crop Yields and Applications of Pesticides¶

-By Fahmid Hasan (B.Sc.Ag(Hons), Bangaldesh Agricutural University, Mymensingh)¶

Importing all necessary libraries¶

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
from scipy.stats import pearsonr
pio.renderers.default = 'notebook'  

Loading the Data¶

In [2]:
pesticides=pd.read_csv('pesticides.csv')
yield_df=pd.read_csv('yield_df.csv')

A quick look at the data sets¶

This data set contains 4349 entries with 7 columns¶

In [3]:
pesticides.sample(10)
Out[3]:
Domain Area Element Item Year Unit Value
2842 Pesticides Use Nicaragua Use Pesticides (total) 1996 tonnes of active ingredients 2041.70
3283 Pesticides Use Russian Federation Use Pesticides (total) 2009 tonnes of active ingredients 25961.00
2858 Pesticides Use Nicaragua Use Pesticides (total) 2012 tonnes of active ingredients 4413.82
1181 Pesticides Use Denmark Use Pesticides (total) 1994 tonnes of active ingredients 3922.00
347 Pesticides Use Belarus Use Pesticides (total) 1992 tonnes of active ingredients 8306.31
3518 Pesticides Use Slovenia Use Pesticides (total) 1992 tonnes of active ingredients 1309.00
2424 Pesticides Use Malaysia Use Pesticides (total) 1994 tonnes of active ingredients 39406.48
3500 Pesticides Use Slovakia Use Pesticides (total) 1999 tonnes of active ingredients 2831.90
560 Pesticides Use Brazil Use Pesticides (total) 2016 tonnes of active ingredients 377176.00
458 Pesticides Use Bhutan Use Pesticides (total) 1995 tonnes of active ingredients 15.70
In [4]:
pesticides.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4349 entries, 0 to 4348
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Domain   4349 non-null   object 
 1   Area     4349 non-null   object 
 2   Element  4349 non-null   object 
 3   Item     4349 non-null   object 
 4   Year     4349 non-null   int64  
 5   Unit     4349 non-null   object 
 6   Value    4349 non-null   float64
dtypes: float64(1), int64(1), object(5)
memory usage: 238.0+ KB

This Data set contains 28242 entries with 7 columns¶

In [5]:
yield_df.sample(10)
Out[5]:
Unnamed: 0 Area Item Year hg/ha_yield average_rain_fall_mm_per_year pesticides_tonnes avg_temp
21201 21201 Netherlands Wheat 2007 72714 778.0 12073.15 11.04
13438 13438 India Soybeans 2007 12351 1083.0 27422.77 26.27
16553 16553 Japan Wheat 1998 35111 1668.0 79821.18 16.45
17770 17770 Lesotho Potatoes 1991 156250 788.0 0.30 14.66
21047 21047 Nepal Potatoes 1992 85916 1500.0 60.11 14.95
18884 18884 Mauritius Cassava 1992 140000 2041.0 687.86 23.81
12211 12211 India Soybeans 1999 11380 1083.0 46195.00 25.97
5331 5331 Burundi Maize 2012 11763 1274.0 98.59 21.44
13602 13602 India Sorghum 2008 10209 1083.0 14485.33 26.83
14586 14586 Indonesia Cassava 1991 120946 2702.0 3259.00 25.27
In [6]:
yield_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28242 entries, 0 to 28241
Data columns (total 8 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Unnamed: 0                     28242 non-null  int64  
 1   Area                           28242 non-null  object 
 2   Item                           28242 non-null  object 
 3   Year                           28242 non-null  int64  
 4   hg/ha_yield                    28242 non-null  int64  
 5   average_rain_fall_mm_per_year  28242 non-null  float64
 6   pesticides_tonnes              28242 non-null  float64
 7   avg_temp                       28242 non-null  float64
dtypes: float64(3), int64(3), object(2)
memory usage: 1.7+ MB

Exploratory Data Analysis and Visualization (Interactive Charts) using Plotly¶

Topics:¶

  1. Top ten Countries used the highest amount of pesticides in the year of 2016

  2. Pesticide usage in the number one country.

  3. Application of pesticides in Bangladesh from 1990 to 2016.

  4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)

  5. Pesticide application and the Yields of some dominant crops in Bangladesh.

  6. No.4 Analysis of Rice (The main dominating crop in Bangladesh)

  7. Uneven distribution of Yields and Pesticide application in some past Years

  8. Some Basic Analysis.

1. Top ten Countries used the highest amount of pesticides in the year of 2016¶

In [7]:
ten_pesticides_country=pesticides[pesticides.Year==2016][['Area','Value']].sort_values(by='Value',ascending=False).head(10)
In [8]:
	
fig=px.bar(ten_pesticides_country,
           y='Area',
           x='Value',
           title='Top ten countries which used the highest amount of pesticides in 2016',
           color=ten_pesticides_country.Area)
fig.update_layout(
    yaxis_title='Country',
    xaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

China(Main land) used the highest amount of Pesticides in the year of 2016¶

2. Pesticides Application in China (Different regions) from 1990 to 2016¶

In [9]:
pesticides_china=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China, mainland', 'China, Taiwan Province of'])][['Area','Year','Value']]
In [10]:
fig=px.bar(
    pesticides_china,
    x='Year',
    y='Value',
    color='Area',
    title='Pesticides Usage in China from 1990 to 2016',
)
fig.update_layout(
    yaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

Mainland of China used the highest amount of Pesticides compared to other regions¶

2.1 Other regions of China¶

In [11]:
pest_china_other_area=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China'])]
In [12]:
	
fig=px.bar(
    pest_china_other_area,
    x='Year',
    y='Value',
    color='Area',
    title='Pesticides usage in China (Hong Kong SAR, Macao SAR)'
)
fig.update_layout(
    yaxis_title='Tonnes of Active Ingredients',
    width=1200,
    height=700
)
fig.show()

Here Macao SAR used more pesticide than Hong Kong SAR over the past years¶

3. Application of pesticides and Crops Yield in Bangladesh from 1990 to 2013.¶

In [13]:
yield_df=yield_df.drop(columns='Unnamed: 0',inplace=False,axis=1)
yield_df['hg/ha_yield']=yield_df['hg/ha_yield']/10000
yield_df=yield_df.rename(columns={'hg/ha_yield':'ton/ha_yield'})
yield_bd=yield_df[yield_df.Area=='Bangladesh']
In [14]:
yield_pest_bd=yield_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [15]:
fig=px.bar(
    yield_pest_bd,
    x=yield_pest_bd.index,
    y='pesticides_tonnes',
    color='ton/ha_yield',
    title='Pesticide Application and Crops Yield from 1990 to 2013 in BD'
)

fig.update_layout(yaxis_title='Pesticide Dose (Tonnes)',
                  width=1200,
                  height=700)
fig.show()

The application of pesticides in BD has increased exponentially over the past years (From 1990 to 2013)¶

4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)¶

In [41]:
fig=px.scatter(yield_df.sample(2000),
               x='pesticides_tonnes',
               y='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item','Year'],
               hover_name='Area',
               title='Realtionship between Crops Yield and Pesticide usage'
               )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()
In [17]:
correlation_AVG= yield_df[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_AVG)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield           1.000000           0.064085
pesticides_tonnes      0.064085           1.000000

Some countries use less amount of pesticides and get higher Crop yields and vice versa. For example Brazil uses a significant amount of pesticides, but tonne/ha yield is not that much compared to Switzerland, Germany, the United Kingdom, etc. also the correlation coefficient between ton/ha_yield and pesticides_tonnes is approximately 0.0641. Which is very close to 0, indicating a very weak positive linear relationship between pesticide usage and crop yield¶

4.1 Correlation Analysis In case of Bangladesh¶

In [18]:
fig=px.scatter(yield_bd,
               x='pesticides_tonnes',
               y='ton/ha_yield',
               size='pesticides_tonnes',
               color='Item',
               hover_data='Year',
               hover_name='Item',
               title='Realtionship between Crops Yield and Pesticide usage (Bangladesh)')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700)
fig.show()
In [19]:
correlation_BD= yield_bd[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_BD)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield            1.00000            0.14386
pesticides_tonnes       0.14386            1.00000

Same goes here, the correlation coefficient is 0.14386 Which is close to 0, indicating a weak positive linear relationship between pesticide usage and crop yield.¶

5. Pesticide application and the Yields of some dominant crops in Bangladesh.¶

In [20]:
fig=px.line(yield_bd,
            x='pesticides_tonnes',
            y=yield_bd['ton/ha_yield'],
            hover_data=['Year'],
            color='Item',
            title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

Potatoes tonne/ha yield is very high compared to other major crops. There was some sort of uneven distribution of pesticides and the yield of crops from 2008 to 2012¶

6. Analysis for Rice (The main dominating crop in Bangladesh)¶

In [21]:
yield_bd_rice=yield_bd[yield_bd.Item=='Rice, paddy']
In [22]:
fig=px.line(yield_bd_rice,
            x='pesticides_tonnes',
            y='ton/ha_yield',
            title='Effects of pesticides usage on Rice yield in Bangladesh from 1990 to 2013',
            hover_data=['Year']
            )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

7. Uneven distribution of Crop Yields and Pesticide application in some of the past Years (2008 to 20012)¶

In [23]:
yield_8_9_10_11_12=yield_df[yield_df.Year.isin(range(2008,2013))]
yield_8_9_10_11_12_bd=yield_8_9_10_11_12[yield_8_9_10_11_12.Area=='Bangladesh']
yield_pests_8to12=yield_8_9_10_11_12_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [24]:
fig=px.bar(yield_pests_8to12,y='pesticides_tonnes',
            x=yield_pests_8to12.index,
            title='Pesticides usage and yield of crops from 2008 to 20012',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides Dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
)
fig.show()

Both the application of pesticide and yield of crops were unevenly distributed. There could be some problem with the Data entry in those years¶

7.1. Distribution in some of the previous years (2000 to 2007)¶

In [25]:
yield_2000to2007=yield_df[yield_df.Year.isin([2000,2001,2002,2004,2005,2006,2007])]
yield_2000to2007bd=yield_2000to2007[yield_2000to2007.Area=='Bangladesh']
yield_2000to2007bd=yield_2000to2007bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [26]:
fig=px.bar(yield_2000to2007bd,y='pesticides_tonnes',
            x=yield_2000to2007bd.index,
            title='Pesticides usage from 2000 to 2007 in BD',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
    )

However distribution ascended gradually from 2000 to 2007¶

8. Some Basic analysis¶

8.1 Top ten maize Producing Countries in 2013¶

In [27]:
crop_count=yield_df.groupby('Area')['ton/ha_yield'].sum()
maize_df=yield_df[yield_df.Item=='Maize'][['Area','Year','Item','ton/ha_yield']].sort_values(by='ton/ha_yield',ascending=False)
maize_df_13=maize_df[maize_df.Year==2013]
top_ten_maize_df_13=maize_df_13.groupby('Area')['ton/ha_yield'].sum().sort_values(ascending=False).head(10)
In [28]:
fig=px.bar(
    top_ten_maize_df_13,
    x=top_ten_maize_df_13.index,
    y=top_ten_maize_df_13,
    title='Top ten Maize producing countries in 2013',
    color=top_ten_maize_df_13.index
)
fig.update_layout(
    xaxis_title='Country',
    yaxis_title='Yield (Tonne/Ha)',
    width=1200,
    height=700
)
fig.show()

8.2 Top ten Rice Producing Countries in 2013¶

In [29]:
rice_df=yield_df[yield_df.Item=='Rice, paddy']
rice_df=rice_df[rice_df.Year==2013]
rice_df
top_ten_rice_df_13=rice_df.groupby('Area')['ton/ha_yield'].sum()
top_ten_rice_df_13=top_ten_rice_df_13.sort_values(ascending=False)
top_ten_rice_df_13=top_ten_rice_df_13.head(10)
In [30]:
fig=px.bar(
    top_ten_rice_df_13,
    x=top_ten_rice_df_13.index,
    y=top_ten_rice_df_13,
    title='Top ten Rice producing countries in 2013',
    color=top_ten_rice_df_13.index
)
fig.update_layout(
    xaxis_title='Country',
    yaxis_title='Yield (Tonne/Ha)',
    width=1200,
    height=700
)
fig.show()

8.3 Ten highest crops (Maize, Potatoes, Rice, Sorghum, Soybeans, Wheat, Cassava, Sweet potatoes, Plantains and others, Yams) yielding countries from 1990 to 2013¶

In [31]:
highest_crops_yield=yield_df.groupby('Area')['ton/ha_yield'].sum()
highest_crops_yield=highest_crops_yield.sort_values(ascending=False)
ten_highest_crops_yield=highest_crops_yield.head(10)
In [32]:
fig=px.bar(
    ten_highest_crops_yield,
    x=ten_highest_crops_yield.index,
    y=ten_highest_crops_yield,
    title='Ten highest crop yielding Countries (1990 to 2013)',
    color=ten_highest_crops_yield.index
)

fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()